Li-Z-Q
Contributor

@Li-Z-Q Li-Z-Q commented Jul 17, 2025

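  1. Support returning hidden_states from the quantized code path
  2. Support quantized loading for embedding models, with two modes: weight_only_int8 and weight_only_int4
  3. When an embedding model is loaded with quantization, preallocate only the first layer's kv_cache and reuse it in later computation, which lowers GPU memory usage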
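
To give a rough sense of what item 3 saves, here is a minimal sketch that estimates KV-cache memory with and without first-layer reuse. The function name, the cache layout `[2, batch, num_kv_heads, max_len, head_dim]`, and the example model dimensions are assumptions made for illustration; they are not taken from this PR's code.

```python
def kv_cache_bytes(num_layers, batch_size, num_kv_heads, max_len, head_dim,
                   dtype_bytes=2, reuse_first_layer=False):
    """Estimate KV-cache memory in bytes (hypothetical helper, not PaddleNLP API).

    Assumes one cache tensor per layer shaped
    [2, batch_size, num_kv_heads, max_len, head_dim], where the leading 2
    holds keys and values, and dtype_bytes=2 corresponds to float16/bfloat16.
    """
    per_layer = 2 * batch_size * num_kv_heads * max_len * head_dim * dtype_bytes
    # With reuse, only the first layer's buffer is allocated and every
    # later layer writes into that same buffer.
    allocated_layers = 1 if reuse_first_layer else num_layers
    return per_layer * allocated_layers


# Illustrative dimensions only (assumed, not read from any real config).
full = kv_cache_bytes(num_layers=28, batch_size=1, num_kv_heads=4,
                      max_len=1024, head_dim=128)
reused = kv_cache_bytes(num_layers=28, batch_size=1, num_kv_heads=4,
                        max_len=1024, head_dim=128, reuse_first_layer=True)
print(f"per-layer caches: {full / 2**20:.1f} MiB, reused cache: {reused / 2**20:.1f} MiB")
```

Under these assumptions the allocation shrinks by a factor equal to the number of layers. Reuse is only reasonable when no later step needs to read back an earlier layer's keys and values, which is presumably the case for an embedding model that runs a single forward pass to obtain hidden_states.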


paddle-bot bot commented Jul 17, 2025

Thanks for your contribution!

@@ -1481,7 +1481,7 @@ def forward(
self.pre_process(**kwargs)
kwargs["cum_offsets"] = cum_offsets

-if caches is not None:
+if caches is not None and not kwargs["kv_cache_reuse"]:
Collaborator

This line needs to check whether kv_cache_reuse is present in kwargs, and fall back to a default value if it is not.
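
A minimal sketch of the kind of guard this comment asks for, assuming a default of `False` (the default actually chosen in the PR is not shown in this thread) and using a hypothetical helper name:

```python
def should_validate_caches(caches, **kwargs):
    """Decide whether the per-layer cache length check should run.

    Hypothetical helper for illustration. kwargs.get() returns the fallback
    when the caller did not pass kv_cache_reuse, whereas kwargs["kv_cache_reuse"]
    would raise KeyError for such callers.
    """
    kv_cache_reuse = kwargs.get("kv_cache_reuse", False)  # assumed default
    return caches is not None and not kv_cache_reuse


# Callers that never heard of kv_cache_reuse keep the original behaviour.
print(should_validate_caches(caches=[object()]))                       # True
print(should_validate_caches(caches=[object()], kv_cache_reuse=True))  # False
print(should_validate_caches(caches=None))                             # False
```

Reading the flag with `.get()` keeps existing call sites working without having to pass the new argument.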

@Liujie0926
Collaborator

The [PaddleNLP-CI] job failed. Manual verification shows that this PR's code raises an error when running the GRPO case. Commands to reproduce manually:
cd PaddleNLP/llm/alignment/rl/
python reward/reward_server.py >log_reward 2>&1 & # start the reward server
export PYTHONPATH=PaddleNLP/:$PYTHONPATH
export PYTHONPATH=PaddleNLP/llm:$PYTHONPATH
python -u -m paddle.distributed.launch --devices "0,1,2,3" run_rl.py ../../config/qwen/grpo_argument.yaml

Error message:
Traceback (most recent call last):
File "/workspace/PaddleNLP/llm/alignment/rl/run_rl.py", line 453, in <module>
main()
File "/workspace/PaddleNLP/llm/alignment/rl/run_rl.py", line 434, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/workspace/PaddleNLP/paddlenlp/rl/trainer/ppo_trainer.py", line 1397, in train
generated_batches: List[DataProto] = self.actor_trainer.generate_sequences(
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
return func(*args, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/rl/trainer/actor_trainer.py", line 414, in generate_sequences
sequences = self.get_model(False).generate(
File "/workspace/PaddleNLP/paddlenlp/rl/utils/infer_utils.py", line 315, in generate
outputs = policy_predictor.predict(input_ids=input_ids, repeat_num=repeat_num, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
return func(*args, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/rl/utils/infer_utils.py", line 98, in predict
outputs = self.predict_dy_insert(
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
return func(*args, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/utils/import_utils.py", line 105, in wrapper
return func(self, *args, **kwargs)
File "/workspace/PaddleNLP/llm/predict/predictor.py", line 1525, in predict_dy_insert
self._infer(self.model_inputs)
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
return func(*args, **kwargs)
File "/workspace/PaddleNLP/llm/predict/predictor.py", line 1172, in _infer
return self.model.generate(
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 396, in _decorate_function
return func(*args, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py", line 669, in generate
ret = self.sample(
File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py", line 777, in sample
outputs = forward(**model_kwargs) # [bs, 1, dim_embed]
File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/generation_utils.py", line 692, in forward
return self(**model_inputs)
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1571, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/qwen2/modeling.py", line 1553, in forward
hidden_states, full_hidden_states = self.qwen2(
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1571, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/qwen2/modeling.py", line 1327, in forward
hidden_states, full_hidden_states = self.transformer_block(
File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1571, in __call__
return self.forward(*inputs, **kwargs)
File "/workspace/PaddleNLP/paddlenlp/experimental/transformers/fused_transformer_layers.py", line 1486, in forward
assert len(caches) == len(self.linear_weights) or len(caches) == 2 * len(self.linear_weights)
AssertionError
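
The assertion in the last frame accepts only two cache-list lengths: one entry per layer, or two per layer. The sketch below mirrors that check in isolation to show how a cache list sized for reuse would trip it whenever the check still runs. The function and parameter names are hypothetical, and the trace alone does not establish that this is the exact cause of the GRPO failure.

```python
def check_cache_count(caches, num_layers, kv_cache_reuse=False):
    """Mirror of the length check from the traceback (illustrative only).

    num_layers stands in for len(self.linear_weights). When kv_cache_reuse
    is set, a single shared cache is expected and the check is skipped.
    """
    if caches is None or kv_cache_reuse:
        return
    assert len(caches) == num_layers or len(caches) == 2 * num_layers, (
        f"expected {num_layers} or {2 * num_layers} caches, got {len(caches)}"
    )


num_layers = 4
check_cache_count([0] * num_layers, num_layers)        # one cache per layer: passes
check_cache_count([0] * 2 * num_layers, num_layers)    # two caches per layer: passes
check_cache_count([0], num_layers, kv_cache_reuse=True)  # reuse: check skipped

try:
    check_cache_count([0], num_layers)  # single cache, flag not set
except AssertionError as exc:
    print("AssertionError:", exc)
```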

@Liujie0926
Collaborator

The network issue affecting the Test job has been fixed; please merge the latest develop code into this branch.

@Li-Z-Q
Contributor Author

Li-Z-Q commented Aug 21, 2025

> The network issue affecting the Test job has been fixed; please merge the latest develop code into this branch.

Merged.

@Li-Z-Q
Contributor Author

Li-Z-Q commented Aug 21, 2025

> The [PaddleNLP-CI] job failed. Manual verification shows that this PR's code raises an error when running the GRPO case. (The quoted reproduction commands and traceback are identical to those in the comment above and end in the AssertionError at fused_transformer_layers.py, line 1486.)

Fixed by changing the default value of kv_cache_reuse.

Collaborator

@DrownFish19 DrownFish19 left a comment

LGTM

@DrownFish19 DrownFish19 merged commit c83684f into PaddlePaddle:develop Aug 21, 2025
9 of 10 checks passed
pkuzyc pushed a commit to pkuzyc/PaddleNLP that referenced this pull request Aug 29, 2025
* fix hidden states

* fix quant kv_cache

* fix Lint style

* fix kv_cache_reuse key error

* fix kv_cache_reuse key error

* remove unused code

* fix kv_cache_reuse default
4 participants